EU AI Act · Article 9 Compliance · LangGraph/LangChain Proposal

01 Regulatory Context

What Article 9 Actually Requires

⚖️ Regulatory Definition

Article 9 mandates that every high-risk AI system must have a continuously running, documented, and maintained risk management system spanning its full lifecycle — from design through post-market. It is not a one-time audit. It is an ongoing engineering process. Compliance deadline: August 2, 2026.

§1 — Establishment

Risk Management System Must Be Documented

A formal RMS must be established, implemented, documented and maintained. Not a single report — an operational, living system.

LangGraph has no built-in state for risk metadata or audit trails per run.

§2 — Continuous Lifecycle Process

Four-Step Iterative Loop

The system must: (a) identify/analyze risks to health, safety, fundamental rights, (b) estimate/evaluate those risks, (c) evaluate risks from post-market monitoring data, and (d) adopt targeted mitigation measures.

LangChain chains execute statelessly — no native risk context propagation between steps.

§3 — Scope Limitation

Only Mitigable Risks In Scope

Article 9 only covers risks that can be reasonably mitigated or eliminated through design, development, or adequate technical information to deployers.

Must define a risk taxonomy scoped to what the agent can actually control.

§4 — Interaction Effects

Consider Combined Requirements

Risk measures must consider interaction effects across the full set of requirements in Chapter III — not each requirement in isolation. Balance, don't over-engineer.

Multi-node LangGraph graphs create emergent behaviors — risk must be assessed at the graph level.

§5 — Residual Risk Acceptability

Residual Risk Must Be Judged Acceptable

After mitigation, the residual risk of each hazard and overall system residual risk must be judged acceptable. Requires a formal acceptability determination framework.

No LangChain/LangGraph native concept of "risk score" or "acceptability threshold" per invocation.

§6 — Mandatory Testing

Pre-Deployment Testing Against Defined Metrics

High-risk AI must be tested throughout development and before deployment. Testing must use predefined metrics and probabilistic thresholds appropriate to the intended purpose.

LangChain evals exist but aren't tied to formal regulatory risk thresholds or documented per §6 requirements.

§7 — Timing of Testing

Testing at Any Point + Pre-Launch Mandatory

Testing must happen as appropriate at any time during development, and without exception before placing the system on the market or putting it into service.

CI/CD pipelines for LangGraph agents rarely include regulatory risk testing gates.

§8 — Post-Market Monitoring

Continuous Feedback Loop from Live Data

The RMS must incorporate post-market monitoring data (per Article 72) to re-evaluate and update risks. The system is never "finished."

No standard pattern for feeding LangGraph production traces back into a risk re-evaluation loop.

§9 — Vulnerable Groups

Special Consideration: Minors and Vulnerable Users

When the system's intended purpose may impact persons under 18 or other vulnerable groups, providers must give specific consideration to adverse impact vectors.

User-segment risk profiling must be baked into the agent's invocation context.
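One way to carry user-segment profiling in the invocation context is a per-segment weight applied to each hazard score. The sketch below is illustrative only: the weights are invented, the segment names reuse the examples from the RiskContext skeleton in this proposal, and treating unknown segments with the strictest weight is an assumed policy, not something Article 9 prescribes.

```python
from typing import Dict

# Hypothetical multipliers; tune and document these per intended purpose.
SEGMENT_WEIGHTS: Dict[str, float] = {
    "professional": 1.0,
    "general_public": 1.2,
    "vulnerable_minor": 2.0,
}


def segment_adjusted_score(base_score: float, user_segment: str) -> float:
    """Scale a hazard score by the segment weight, capped at 1.0."""
    if not 0.0 <= base_score <= 1.0:
        raise ValueError(f"base_score out of range: {base_score}")
    if user_segment in SEGMENT_WEIGHTS:
        weight = SEGMENT_WEIGHTS[user_segment]
    else:
        # Assumed policy: unknown segments get the strictest weight.
        weight = max(SEGMENT_WEIGHTS.values())
    return min(1.0, base_score * weight)


professional = segment_adjusted_score(0.3, "professional")
minor = segment_adjusted_score(0.3, "vulnerable_minor")
unknown = segment_adjusted_score(0.6, "unknown_segment")
print(professional, minor, unknown)
```

The same base score lands closer to a review threshold for a vulnerable segment, which is the behavior the gap note above asks for.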

§10 — Sectoral Integration

May Merge With Existing RMS Under Other EU Law

If the organization already has risk management processes mandated by other EU laws (finance, medical, etc.), Art. 9 requirements can be integrated into those procedures.

Opportunity: build a composable RMS layer that plugs into existing compliance frameworks.

02 Gap Analysis

Why LangGraph/LangChain Don't Natively Comply

🔴 Critical Gap

No Persistent Risk State

LangGraph state checkpointing is functional, not regulatory. There is no native concept of a "risk event," a "mitigation decision," or a "residual risk score" per graph execution.

🔴 Critical Gap

No Audit Trail Standard

LangChain callbacks log execution events, but do not produce structured documentation that satisfies Article 9 §1's requirement for a maintained risk management system.

🔴 Critical Gap

No Pre-Defined Risk Metrics

LangSmith evaluators measure performance (quality, latency, accuracy) but are not structured around Article 9 §6's "prior defined metrics and probabilistic thresholds" for risk categories.

🟡 Significant Gap

No Post-Market Feedback Loop

Production traces in LangSmith are not automatically routed into a risk re-evaluation system. §8 requires this loop to be operational, not optional.

🟡 Significant Gap

No Graph-Level Risk Composition

LangGraph nodes are individually testable, but §4 requires risk assessment at the combined application level. Multi-agent graphs have emergent risk that node-level testing misses.

🟢 Manageable

Human-in-the-Loop Hooks Exist

LangGraph has interrupt/approval node patterns. These can be mapped to Article 14 (human oversight) requirements and partly address §5 residual risk acceptability via human review gates.

03 Solution Proposal

The Proposed Compliance Architecture

💡 Core Thesis

Build a Risk Management Middleware Layer that wraps your LangGraph agent. It adds four capabilities — risk context injection, structured event logging, evaluation gates, and post-market feedback routing — without modifying your core agent graph logic. Think of it as a compliance sidecar.

High-Level System Architecture — Article 9 Compliance Layer

🏗️ Input: Risk Context Injector

🔍 Runtime: LangGraph Agent

🛡️ Intercept: Risk Event Bus

📊 Evaluate: Risk Scoring Engine

🗄️ Persist: RMS Data Store

🔁 Feedback: Post-Market Monitor

04 Core Components

Four Pillars of the Solution

🏷️

Risk Context State

Addresses §1, §2, §3, §9

Extend LangGraph's StateGraph schema with a mandatory risk_context field. This carries risk metadata — user segment, hazard classes, active mitigations, invocation purpose — through every node. Makes risk a first-class citizen of agent state, not an afterthought.

📝

Structured RMS Logger

Addresses §1, §2d, §7, §8

A LangChain callback handler that intercepts every node transition, tool call, and LLM invocation, emitting structured risk events to a persistent store. Events include hazard ID, mitigation applied, residual risk score, and operator decision. Produces the documented audit trail Article 9 demands.

🧪

Regulatory Eval Suite

Addresses §6, §7

A pre-deployment test suite built on LangSmith evaluators, structured around pre-defined risk metrics and probabilistic thresholds (not just accuracy). Tests include: harmful output rate, fundamental-rights proxy scores, bias probes, and consistency-under-distribution-shift. Blocks deployment if thresholds breached.

🔁

Post-Market Monitor

Addresses §2c, §8

A production feedback pipeline that ingests LangSmith traces, flags anomalous risk events, and routes them back to the risk identification step. Triggers automatic risk re-evaluation when drift is detected. Closes the lifecycle loop that §2(c) and §8 explicitly require.

05 Technical Specification

Component-Level Build Specification

RiskStateSchema

LangGraph · State Extension

Extend TypedDict-based LangGraph state with required fields: risk_context (hazard_classes, user_segment, intended_purpose), active_mitigations (list of applied controls), risk_events (append-only list), residual_risk_score (float, updated by scoring node). Add a dedicated risk_assessment_node that runs at graph entry and after any tool call with side effects.

TypedDict StateGraph Annotated[list, operator.add]

Art. 9 §1 Art. 9 §2 Art. 9 §9
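The append-only behavior of risk_events comes from the reducer annotation: a channel declared as Annotated[list, operator.add] is combined with each node's returned list instead of being overwritten. The sketch below imitates that merge in plain Python so it runs without LangGraph installed. apply_update is a hypothetical stand-in for the framework's state merge, and the node's fixed score is a placeholder.

```python
import operator
from typing import Annotated, Any, Dict, List, TypedDict, get_type_hints


class RiskState(TypedDict):
    # Reducer-annotated channel: updates are concatenated, never overwritten.
    risk_events: Annotated[List[dict], operator.add]
    # Plain channel: the latest write wins.
    residual_risk_score: float


def risk_assessment_node(state: RiskState) -> Dict[str, Any]:
    """Hypothetical node: returns a partial update, not the full state."""
    event = {"hazard_id": "HAZ-001", "node_id": "risk_assessment", "residual_score": 0.2}
    return {"risk_events": [event], "residual_risk_score": 0.2}


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the framework merge: apply the reducer if one is annotated."""
    hints = get_type_hints(RiskState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        merged[key] = reducer(state[key], value) if reducer else value
    return merged


state: Dict[str, Any] = {"risk_events": [], "residual_risk_score": 0.0}
state = apply_update(state, risk_assessment_node(state))
state = apply_update(state, risk_assessment_node(state))
print(len(state["risk_events"]), state["residual_risk_score"])
```

Because nodes only return the new events, no node can silently drop earlier entries from the log during a run.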

RiskEventCallback

LangChain · BaseCallbackHandler

Subclass BaseCallbackHandler (or AsyncCallbackHandler). Override on_llm_end, on_tool_end, on_chain_end to emit structured RiskEvent objects to a message queue or database. Each event carries: timestamp, node_id, hazard_id, mitigation_applied, residual_score, run_id (for traceability). Store events in append-only log (PostgreSQL / DynamoDB / BigQuery) for regulator access.

BaseCallbackHandler RiskEvent(dataclass) append-only log

Art. 9 §1 Art. 9 §7 Art. 12
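To make the "append-only log" concrete, here is a minimal file-based sketch using JSON Lines. JsonlRiskLog and RiskEventRecord are hypothetical names, and a local file is only a stand-in for the PostgreSQL / DynamoDB / BigQuery options named above. File permissions do not make a log tamper-evident, so treat this as an illustration of the write pattern, not a production store.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class RiskEventRecord:
    timestamp: str
    run_id: str
    node_id: str
    hazard_id: str
    mitigation_applied: str
    residual_score: float


class JsonlRiskLog:
    """Hypothetical append-only store: one JSON object per line, never rewritten."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append_event(self, event: RiskEventRecord) -> None:
        # Mode "a" only ever adds to the end of the file.
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(asdict(event), sort_keys=True) + "\n")

    def read_run(self, run_id: str) -> List[dict]:
        """Return every event recorded for one run, in write order."""
        with open(self.path, encoding="utf-8") as handle:
            rows = [json.loads(line) for line in handle if line.strip()]
        return [row for row in rows if row["run_id"] == run_id]


log = JsonlRiskLog(os.path.join(tempfile.mkdtemp(), "risk_events.jsonl"))
for run_id, hazard_id in [("run-1", "HAZ-001"), ("run-2", "HAZ-002"), ("run-1", "HAZ-003")]:
    log.append_event(
        RiskEventRecord(
            timestamp=datetime.now(timezone.utc).isoformat(),
            run_id=run_id,
            node_id="llm_node",
            hazard_id=hazard_id,
            mitigation_applied="MIT-003",
            residual_score=0.1,
        )
    )
run_1_events = log.read_run("run-1")
print([row["hazard_id"] for row in run_1_events])
```

Keying every record by run_id is what lets a reviewer reconstruct a single execution from the log.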

HazardTaxonomy

Config · Risk Classification

A YAML/JSON-defined taxonomy of hazard classes relevant to your agent's domain. Each hazard entry carries: hazard_id, description, fundamental_rights_vector (which EU rights could be affected), likelihood_prior, severity_rating, and linked mitigation controls. This taxonomy is the documented output of the §2(a) "identification and analysis" step and must be versioned alongside code.

hazard_id: HAZ-001 severity: HIGH mitigations: [MIT-003]

Art. 9 §2a Art. 9 §2b Art. 9 §3
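A taxonomy file is only useful as evidence if malformed entries are rejected at load time. The sketch below parses a JSON taxonomy with the fields listed above and validates each entry. The sample hazard, the three-level severity scale, and the rule that an entry without a linked mitigation is rejected (one reading of the §3 scope limitation) are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

TAXONOMY_JSON = """
{
  "version": "1.0",
  "hazards": [
    {
      "hazard_id": "HAZ-001",
      "description": "Illustrative entry: biased ranking of applicants",
      "fundamental_rights_vector": ["non-discrimination"],
      "likelihood_prior": 0.05,
      "severity_rating": "HIGH",
      "mitigations": ["MIT-003"]
    }
  ]
}
"""

ALLOWED_SEVERITIES = {"LOW", "MEDIUM", "HIGH"}  # assumed scale


@dataclass(frozen=True)
class Hazard:
    hazard_id: str
    description: str
    fundamental_rights_vector: List[str]
    likelihood_prior: float
    severity_rating: str
    mitigations: List[str]


def load_taxonomy(raw: str) -> Dict[str, Hazard]:
    """Parse and validate the taxonomy; raise ValueError on a malformed entry."""
    document = json.loads(raw)
    hazards: Dict[str, Hazard] = {}
    for entry in document["hazards"]:
        hazard = Hazard(**entry)
        if hazard.hazard_id in hazards:
            raise ValueError(f"duplicate hazard_id: {hazard.hazard_id}")
        if hazard.severity_rating not in ALLOWED_SEVERITIES:
            raise ValueError(f"unknown severity: {hazard.severity_rating}")
        if not 0.0 <= hazard.likelihood_prior <= 1.0:
            raise ValueError(f"likelihood_prior out of range: {hazard.hazard_id}")
        if not hazard.mitigations:
            # Assumed scope rule: an entry with no linked control is rejected.
            raise ValueError(f"no mitigation linked: {hazard.hazard_id}")
        hazards[hazard.hazard_id] = hazard
    return hazards


taxonomy = load_taxonomy(TAXONOMY_JSON)
print(taxonomy["HAZ-001"].severity_rating, taxonomy["HAZ-001"].mitigations)
```

Running this loader in CI means a taxonomy change that breaks the schema fails before it reaches the agent.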

ResidualRiskScorer

LangGraph · Risk Node

A graph node (or callable injected into the should_continue condition) that computes a composite residual risk score after mitigation controls have been applied. Compares score against pre-defined acceptability thresholds (set per intended purpose per §6). If residual risk exceeds threshold, routes to a human_review_node (interrupt) before execution continues. Satisfies §5 residual risk acceptability requirement.

conditional_edge interrupt() risk_threshold.yaml

Art. 9 §4 Art. 9 §5 Art. 14
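The proposal does not fix a scoring formula, so the sketch below assumes one: a noisy-OR combination, 1 − Π(1 − rᵢ), which treats hazards as independent. The thresholds and the node names "human_review" and "continue" are placeholders. The routing function has the shape of a conditional-edge callable (it returns the name of the next node) but is plain Python and does not import LangGraph.

```python
import math
from typing import Dict

# Assumed thresholds; real values belong in a reviewed, versioned config file.
PER_HAZARD_THRESHOLD = 0.30
OVERALL_THRESHOLD = 0.40


def composite_residual_risk(per_hazard: Dict[str, float]) -> float:
    """Noisy-OR combination: 1 - prod(1 - r_i). Assumes independent hazards."""
    for hazard_id, score in per_hazard.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {hazard_id}: {score}")
    return 1.0 - math.prod(1.0 - score for score in per_hazard.values())


def route_on_residual_risk(per_hazard: Dict[str, float]) -> str:
    """Return the next node name: review if any single or combined risk is too high."""
    if any(score > PER_HAZARD_THRESHOLD for score in per_hazard.values()):
        return "human_review"
    if composite_residual_risk(per_hazard) > OVERALL_THRESHOLD:
        return "human_review"
    return "continue"


low = {"HAZ-001": 0.05, "HAZ-002": 0.10}
stacked = {"HAZ-001": 0.25, "HAZ-002": 0.25, "HAZ-003": 0.25}
print(route_on_residual_risk(low))
print(route_on_residual_risk(stacked))
```

The second case is the one that matters for §4 and §5: every hazard passes its own threshold, yet the combined score (about 0.58) exceeds the overall limit, so the run is sent to human review.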

RegulatoryEvalSuite

LangSmith · CI/CD Gate

A LangSmith evaluator suite with custom metrics tied to the hazard taxonomy: (1) HarmfulOutputRate — proportion of runs triggering HAZ-class events, (2) FundamentalRightsProxy — LLM-graded assessment of output against a rights rubric, (3) BiasProbe — structured demographic parity tests, (4) ConsistencyUnderDrift — re-run with distribution-shifted inputs. Suite is run in CI/CD. Deployment is blocked if thresholds breach. Results are documented as §6/§7 evidence.

run_on_dataset() EvaluationResult threshold_gate.py

Art. 9 §6 Art. 9 §7 Art. 15
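The deployment gate itself can be a small function that compares measured metrics to predefined limits and returns a process exit code. The sketch below is independent of LangSmith: it assumes the evaluator results have already been reduced to one number per metric. The metric names and limits are hypothetical, and every metric is assumed to be a rate where lower is safer.

```python
from typing import Dict, List

# Hypothetical limits: each metric is a rate in [0, 1] where lower is safer.
THRESHOLDS: Dict[str, float] = {
    "harmful_output_rate": 0.01,
    "bias_probe_parity_gap": 0.05,
}


def gate_violations(results: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """List every breached or missing metric; an empty list means the gate passes."""
    violations = []
    for metric, limit in sorted(thresholds.items()):
        if metric not in results:
            # A metric that was never measured counts as a failure, not a pass.
            violations.append(f"{metric}: no result recorded")
        elif results[metric] > limit:
            violations.append(f"{metric}: {results[metric]:.3f} exceeds {limit:.3f}")
    return violations


def exit_code(results: Dict[str, float]) -> int:
    """Return 1 on any violation so a CI job can fail on a non-zero status."""
    return 1 if gate_violations(results, THRESHOLDS) else 0


passing = {"harmful_output_rate": 0.004, "bias_probe_parity_gap": 0.02}
failing = {"harmful_output_rate": 0.030}
print(exit_code(passing))
print(gate_violations(failing, THRESHOLDS))
```

In a pipeline script you would finish with sys.exit(exit_code(results)) and archive the violation list alongside the run as test evidence.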

PostMarketMonitor

Async Pipeline · §8 Loop

An async service (cron-based or event-driven) that pulls production RiskEvent logs, aggregates anomaly signals, and triggers a risk re-evaluation run when drift is detected — e.g. HarmfulOutputRate exceeds baseline by >2σ. Writes a timestamped risk re-evaluation report to the RMS datastore, closing the §2(c)/§8 feedback loop. Implements the Article 9 requirement that the RMS is "regularly reviewed and updated."

drift_detector.py anomaly_threshold risk_re_eval_report

Art. 9 §2c Art. 9 §8 Art. 72
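The ">2σ" trigger can be written down in a few lines. This sketch assumes the baseline is a list of HarmfulOutputRate values from earlier monitoring windows and uses the sample standard deviation. The window values are made up, and a real detector would also need a minimum sample size per window.

```python
import statistics
from typing import List


def drift_detected(baseline_rates: List[float], current_rate: float, sigmas: float = 2.0) -> bool:
    """True when current_rate is more than `sigmas` standard deviations above the baseline mean."""
    if len(baseline_rates) < 2:
        raise ValueError("need at least two baseline windows to estimate spread")
    mean = statistics.mean(baseline_rates)
    spread = statistics.stdev(baseline_rates)  # sample standard deviation
    return current_rate > mean + sigmas * spread


# Hypothetical weekly HarmfulOutputRate values used as the baseline.
baseline = [0.010, 0.012, 0.009, 0.011, 0.010]
within_band = drift_detected(baseline, 0.011)
above_band = drift_detected(baseline, 0.020)
print(within_band, above_band)
```

Only upward drift triggers a re-evaluation here; whether a sudden drop should also be investigated (for example, as a sign of broken hazard classification) is a design choice left open.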

06 Reference Implementation

Skeleton: Risk-Aware LangGraph State

# risk_state.py — Article 9 §1, §2 compliant state schema
from typing import Annotated, TypedDict, List
import operator
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class RiskEvent:
    hazard_id: str          # e.g. "HAZ-001-BIAS"
    triggered_at: datetime
    node_id: str
    mitigation_applied: str # e.g. "MIT-003-FILTER"
    residual_score: float   # 0.0 (no risk) → 1.0 (max risk)
    operator_reviewed: bool = False

@dataclass
class RiskContext:
    intended_purpose: str
    user_segment: str       # e.g. "general_public", "vulnerable_minor", "professional"
    hazard_classes: List[str] = field(default_factory=list)
    active_mitigations: List[str] = field(default_factory=list)

class Article9State(TypedDict):
    # Your existing agent state fields:
    messages: Annotated[list, operator.add]
    
    # Article 9 §1 — documented risk management fields:
    risk_context: RiskContext
    risk_events: Annotated[List[RiskEvent], operator.add]  # append-only log
    residual_risk_score: float                              # §5 — must be judged acceptable
    risk_acceptance_status: str                             # "pending" | "accepted" | "rejected"
    rms_run_id: str                                         # links to RMS datastore record
  

# risk_callback.py — Article 9 §1 documented maintenance, §7 pre-deployment evidence
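from datetime import datetime, timezone

from langchain_core.callbacks import AsyncCallbackHandler
from langchain_core.outputs import LLMResult

from risk_state import RiskEvent  # dataclass defined in risk_state.py above

# The hooks below are coroutines, so subclass the async handler base class.
class RiskEventCallback(AsyncCallbackHandler):
    """Emits one RiskEvent per hazard detected in an LLM response."""

    def __init__(self, rms_store, hazard_classifier):
        self.rms_store = rms_store                  # must expose async append_event(event)
        self.hazard_classifier = hazard_classifier  # must expose async classify(generations)

    async def on_llm_end(self, response: LLMResult, **kwargs):
        # Classify output against hazard taxonomy (Art. 9 §2a)
        hazards = await self.hazard_classifier.classify(response.generations)

        for hazard in hazards:
            event = RiskEvent(
                hazard_id=hazard.id,
                triggered_at=datetime.now(timezone.utc),  # timezone-aware; utcnow() is deprecated
                # run_id is a UUID identifying the LLM run, not the graph node; stored as str
                node_id=str(kwargs.get("run_id")),
                mitigation_applied=hazard.auto_mitigation,
                residual_score=hazard.residual_score,
            )
            # Persist to the append-only RMS store (supports the §1 "maintained" requirement)
            await self.rms_store.append_event(event)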
  

07 Compliance Mapping

Article 9 Paragraph → Solution Component Mapping

Article 9 Paragraph	Requirement Summary	Solution Component
§1 — Establishment	Documented, implemented and maintained RMS	RiskEventCallback + RMS DataStore
§2(a) — Identify	Identify/analyze known and foreseeable risks	HazardTaxonomy (versioned YAML)
§2(b) — Estimate	Estimate and evaluate risks	ResidualRiskScorer node
§2(c) — Post-Market	Evaluate risks from post-market monitoring data	PostMarketMonitor pipeline
§2(d) — Mitigate	Adopt targeted risk management measures	RiskContext.active_mitigations + HazardTaxonomy mitigations
§3 — Scope	Only mitigable risks in scope	HazardTaxonomy scope definition + risk classification filter
§4 — Interaction Effects	Consider combined application of requirements	Graph-level RiskScorer (not node-level); composite scoring formula
§5 — Residual Risk	Residual risk must be judged acceptable	ResidualRiskScorer + acceptability_threshold.yaml + human interrupt
§6 — Testing	Test against predefined metrics and thresholds	RegulatoryEvalSuite (LangSmith)
§7 — Testing Timing	Test throughout dev; mandatory before deployment	CI/CD gate blocking deployment on threshold breach
§8 — Continuity	Regularly reviewed, updated with live data	PostMarketMonitor + drift-triggered re-evaluation
§9 — Vulnerable Groups	Consider impact on minors and vulnerable users	RiskContext.user_segment field + segment-specific hazard weights
§10 — Sectoral Integration	May integrate with existing EU-law RMS	RMS DataStore designed as composable — plugs into DORA/MDR/GDPR audit systems

08 Roadmap

Three-Phase Build Plan

01 · Weeks 1–4

Foundation — Risk State & Taxonomy

Extend your LangGraph StateGraph with Article9State. Author the HazardTaxonomy YAML for your specific agent domain, mapping hazard classes to fundamental rights vectors, likelihood priors, and mitigation controls. Wire up RiskEventCallback to an append-only event store. At the end of Phase 1, your agent produces a structured risk event log on every run. This gives you the documentary evidence base for §1 (documented, maintained) and the inputs for §2(a/b) (identification/estimation); the log on its own does not satisfy either paragraph, because both also require a maintained review process around it.

Article9State schema HazardTaxonomy v1.0 RiskEventCallback RMS DataStore schema

02 · Weeks 5–8

Enforcement — Scoring, Thresholds & Eval Gates

Build the ResidualRiskScorer node and wire it into your graph's conditional edges. Define acceptability thresholds per hazard class in risk_thresholds.yaml. Build the RegulatoryEvalSuite in LangSmith with HarmfulOutputRate, FundamentalRightsProxy, and BiasProbe metrics. Integrate the suite as a required CI/CD gate — deployment to production is blocked if any threshold is breached. Document all test runs as §6/§7 evidence artifacts.

ResidualRiskScorer node risk_thresholds.yaml RegulatoryEvalSuite CI/CD deployment gate

03 · Weeks 9–12

Lifecycle Closure — Post-Market Monitoring & Documentation

Deploy the PostMarketMonitor as an async service. Configure drift detection thresholds per hazard class. Build the risk re-evaluation pipeline that triggers when drift is detected and writes timestamped re-evaluation reports to the RMS store. Generate a technical documentation bundle (Article 11) from RMS data — this is your compliance evidence package. Finally, validate §10 integration if other sectoral EU law applies: map RMS events into GDPR DPIA, DORA ICT risk log, or MDR QMS as appropriate.

PostMarketMonitor service drift_detector.py RMS Documentation Bundle §10 sectoral integration

📌 Scope Note

This proposal addresses Article 9 in isolation. A full high-risk AI system compliance program also requires: Article 10 (data governance), Article 11 (technical documentation), Article 12 (automatic logging), Article 13 (transparency), Article 14 (human oversight), and Article 15 (accuracy/cybersecurity). The components proposed here — particularly RiskEventCallback, the RMS DataStore, and the RegulatoryEvalSuite — are intentionally designed as foundations that future Articles 10–15 work can build on. This is not legal advice.
